FOREWORD


Research on improving performance both of groups of soldiers
functioning as a crew and of individual soldiers forms a major program at
the Army Research Institute for the Behavioral and Social Sciences (ARI).

The Performance-Oriented Individual Skill Development and Evaluation
project is concerned with improving relevance, efficiency, and economy
of individual enlisted training and evaluation. In a major move to
improve the combat readiness of soldiers, the Army is implementing the
Enlisted Personnel Management System (EPMS). This system requires
restructuring the individual training and testing systems to make them
job relevant.

Evaluation of the individual soldiers for career progression in the
EPMS is based on criterion-referenced performance testing of actual job
skills rather than generalized knowledge. In an effort to achieve more
economy in the large-scale testing required for the EPMS, a research
program seeking to develop simulated performance tests has been
initiated. One promising line of endeavor is the use of various
audiovisual media to provide the stimulus input and job setting for the
skill items. The present publication reports the results of a study
investigating the use of television stimulus inputs in conjunction with
an electronic responding vehicle which requires real-time decisions and
responses.

This research was done at the ARI Field Unit at Fort Knox, Ky., in
response to requirements of Army Project 2Q763731A770 and to special
requirements of the Training and Doctrine Command (TRADOC), Fort Monroe, Va.


TELEVISION  AS  STIMULUS  INPUT  IN  SYNTHETIC  PERFORMANCE  TESTING 


BRIEF 


Requirement:

To  investigate  the  validity  and  feasibility  of  using  television 
stimulus  inputs  in  a synthetic  performance  test,  and  to  determine  if 
such  tests  can  replace  hands-on  performance  tests.  This  research  is 
considered  necessary  because  of  the  high  cost  of  hands-on  tests  and 
the  need  to  develop  a less  expensive,  reasonably  valid  substitute. 


Procedure:

A synthetic  performance  test  using  television  as  the  stimulus  input 
was  developed  and  produced.  The  test  was  considered  a performance 
test  because  the  items  covered  the  actual  tasks  the  examinees  were 
required  to  perform  on  the  job.  The  test  was  administered  to  70  soldier- 
trainees  who  had  completed  advanced  training  in  the  subject  matter. 

Scores  made  by  these  same  trainees  on  a hands-on  performance  test 
which  had  similar  items  were  also  obtained.  The  hands-on  test  was 
administered  routinely  by  the  Army  to  all  trainees  at  the  end  of  the 
advanced  training.  A parallel  paper-and-pencil  test  was  administered 
to  64  soldier-trainees,  and  hands-on  scores  were  also  obtained  for 
these  trainees. 


Findings:

The results favored the feasibility of television testing. The
test was produced and administered without difficulty, and the examinees
had a very favorable attitude. The examinees had no trouble
understanding and responding to the items. The examinees judged the test
as "fair" (impartial) in terms of testing them on important tasks they
should have mastered.

The  validity  of  the  results  was  inconclusive.  The  criterion  scores 
for  the  hands-on  test  were  unsatisfactory  in  that  most  examinees  made 
a perfect  score.  The  correlation  between  the  television  and  hands-on 
tests  was  low-positive  but  nonsignificant.  Comparison  between  the 
television  and  parallel  paper-and-pencil  tests  also  showed  no  overall 
difference,  although  there  were  significant  differences  between  many 
items. 


Utilization of Findings:

This study provides insufficient evidence to conclude that synthetic
performance tests with television inputs can replace hands-on
performance tests. To determine more precisely whether television
testing has promise requires the development of a more satisfactory
hands-on criterion test and a more thorough examination of those tasks
and response components that appear most amenable to television testing.
INTRODUCTION

Over  the  past  10  years  or  so,  the  Army  has  tried  to  convert  more  of 
its  testing  to  the  "hands-on"  performance  mode,  especially  at  training 
centers  and  at  the  beginning  skill  levels.  Even  more  emphasis  has  been 
placed on performance testing in the last 2 or 3 years with the
beginning of the Skill Qualification Testing (SQT) program. Performance
testing  is  highly  desirable  because  of  its  high  face  validity  and  high 
user  acceptability;  however,  this  type  of  testing  is  very  costly,  hard 
to  standardize,  and  often  not  feasible. 

The  alternative  to  hands-on  performance  testing  has  generally  been 
the  standard,  group-administered,  knowledge-type,  paper-and-pencil 
test.  Although  relatively  easy  to  produce  and  administer,  this  type 
of  test  is  generally  considered  to  have  low  validity  and  low  user 
acceptability. 

Osborn  (1970)  has  suggested  that  a compromise  validity-feasibility 
tradeoff  point  might  be  reached  by  using  synthetic  performance  tests. 
According  to  Osborn,  the  term  "synthetic  performance  test"  refers  to 
any  performance  test  that  is  less  than  a full  hands-on  test,  but  more 
than the group-administered, knowledge-type, paper-and-pencil test.
Synthetic  performance  tests  include  all  tests  that  use  any  type  of 
simulated  inputs  or  responses.  Part-task  tests,  in  which  only  one  or 
a few  response  components  of  a task  are  measured,  are  also  included 
under  synthetic  performance  tests.  The  synthetic  performance  test 
is  conceived  as  less  costly  than  a hands-on  test,  but  as  a test  that 
still  has  reasonable  validity  and  user  acceptability. 

To support the Army's adoption of performance testing, the U.S.
Army Research Institute has initiated a broad-based research program
to  investigate  the  possibilities  of  synthetic  performance  testing  as  a 
cost-effective  alternative  to  the  usual  hands-on  procedures.  The  goal 
of  this  research  is  to  develop  a psychometric  base  for  both  hands-on 
and  synthetic  methods. 

The research focus has been on the use of audiovisual media to provide
the simulated stimulus input. The reasoning behind this focus is that
audiovisual media stand midway in the stimulus fidelity range^ and,
at the same time, are at the medium to high end of the feasibility scale.
Thus, audiovisual media may represent a good fidelity-feasibility tradeoff
point insofar as stimulus input is concerned. Figure 1 shows a conception
of this fidelity-feasibility tradeoff.

^Stimulus fidelity as used here refers to how closely the test stimulus
resembles the real world, and stimulus feasibility refers to how much
it costs to present the test stimulus in a testing situation (high
feasibility equals low cost).

Figure 1. Conception of stimulus fidelity and feasibility tradeoff.

The overall research program has the following objectives:

1. To explore the parameters of the various audiovisual media to
determine the media's applicability to synthetic performance testing.

2. To explore various responding modes and response devices that
can be used with audiovisual stimulus inputs.

3. To determine whether those response components of a task that
can be measured using audiovisual media are sufficient to yield an
acceptable measure of the entire task.

4. To develop a task classification system that will enable a
synthetic performance test developer to determine by analyzing the task
(a) when audiovisual media should be used as the stimulus input, (b)
which medium is advisable, and (c) which response components should
be measured.

Several experiments in this research program are now in process
using a number of different audiovisual media. This paper, which is
concerned with television as the stimulus input, presents the results of
the first of these experiments.

This first experiment was limited in nature and focused on the
feasibility of using television as the stimulus input. As such it was
concerned mostly with the first research objective (applicability of
media to testing), with some exploration into the second and third
objectives, responses to stimuli and test-task comparisons.


Background

The impetus for this research stems from the Army's decision to
substitute the Skill Qualification Testing (SQT) program for the current
MOS testing program as a means of assessing the job skills of enlisted
personnel. The SQT program is intended to be based on job-sample tests
wherever practical, as contrasted to the current MOS paper-and-pencil
knowledge test.

This change was brought about partially as a result of the research
of a number of investigators (Engel, July 1970; Engel, October 1970;
Engel & Rehder, 1970; Shirkey, 1965; Urry, Shirkey, & Nicewander, 196?)
who questioned the validity of the MOS test for job skill assessment.

In 1966 the Army convened a special board of inquiry (Brown Board) to
survey the entire question of written MOS tests for assessing job skills
and job knowledge. This board recommended that performance tests be
substituted for written tests wherever practical (U.S. Army, 19??).
Following the publication of the findings of the Brown Board, the Army
has made substantial progress in implementing the recommendation (e.g.,
the Tank Crewman Advanced Individual Training performance tests
administered in the form of a "county fair," with examinees moving from
test to test around the examination area, during and at the end of
each training cycle). However, due to high costs and difficulty in
maintaining  standardization,  the  performance  test  obviously  is  limited 
in  terms  of  making  up  a substantial  part  of  each  SQT  test.  This  is 
particularly  true  at  the  higher  skill  levels  and  for  many  hard-to- 
measure  tasks.  Occhialini  (1972),  for  example,  presents  evidence  that 
performance  tests  are  extremely  difficult  to  prepare  and  administer, 
and  are  of  questionable  validity.  Engel  and  Rehder  (1970)  review  the 
arguments  against  the  use  of  performance  tests  for  part  or  all  of  the 
SQT  battery.  Their  general  conclusion  is  that  the  exclusive  use  of 
performance  tests  in  an  SQT  battery  would  be  too  costly  and  impractical. 

Reacting to the pros and cons of paper-and-pencil vs. performance
tests, several researchers have proposed compromises. Engel and Rehder
(1970)  advocate  a mixture-of-measurement  technique  in  each  SQT  test, 
combining  work  samples,  simulated  tests,  peer  ratings,  and  paper-and- 
pencil  tests.  They  present  evidence  indicating  that  cognitive  items 
can  be  measured  adequately  by  paper-and-pencil  tests;  that  motor- 
manipulative  items  require  work  sample  or  simulated  tests;  and  that 
peer  ratings  can  be  used  to  judge  social,  leadership,  and  overall 
ability. 

Osborn's  (1970)  approach  is  concerned  with  developing  synthetic 
tests  that  it  is  hoped  will  eliminate  some  of  the  impracticality  of 
administering  performance  tests,  while  reducing  the  verbal  component 
and improving the validity of paper-and-pencil tests. Osborn
visualizes a continuum bounded on one extreme by paper-and-pencil knowledge
tests and on the other by job-sample skill tests. Within this
continuum, a number of synthetic tests more or less removed from each
extreme can be constructed. The continuum is conceived of as being
scaled in psychological units and varies along the dimensions of
stimulus fidelity and response fidelity (or a mixture of both).

In any combat situation, the stimulus dimension would be a large
complex  composed  of  visual,  auditory,  tactile,  kinesthetic,  olfactory, 
pain,  and  stress  inputs.  The  response  dimension  would  be  an  equally 
large  complex  of  cognitive,  motor-manipulative,  and  perceptual  outputs. 
For  the  purposes  of  illustration,  the  stimulus  and  response  fidelity 
dimensions  for  armor  crewmen  might  be  conceptualized  as  shown  in  Figure 
2.  Osborn  maintains,  in  an  analysis  similar  to  the  one  shown  in  Figure 
2,  that  one  must  pull  away  from  each  extreme  of  the  continuum  to  develop 
synthetic tests that are both feasible and more valid than
paper-and-pencil tests.

An important aspect of Osborn's conception is his reasoning with
regard to part-task testing (Osborn & Ford, 1976). In this conception,
each task is composed of a number of response components divided into
cognitive, perceptual, and motor behaviors. Figure 3 shows a task broken
down into response components. (This task is performed by the loader
on an M60A1 tank.)

Figure 2. Stimulus and response fidelity dimensions.


The reasoning behind part-task testing is that it may not be
necessary to test every response component in a particular task in order
to determine how well the whole task can be performed. It may be
possible to get a good indication of whole-task performance by measuring
only a few response components or perhaps measuring only one critical
response component.

Part-task testing becomes crucial when audiovisual stimulus inputs
are used because the nature of the medium precludes obtaining any
measurements on most motor-response components. In order to obtain
measurements on motor-response components one needs to test on real
equipment or a hands-on simulator. Since the measurable response
components in audiovisual simulation are limited to perceptual and
cognitive ones, it follows that the usefulness of audiovisual stimulus
inputs is dependent upon the validity of the part-task testing concept.
One objective of the research program is to check the part-task testing
concept by correlating scores made on part-tasks using audiovisual
stimulus inputs with scores made on the corresponding whole task tested
in the hands-on mode.
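The correlation check described above can be sketched in a few lines. This is only an illustration, assuming a Pearson product-moment coefficient is the statistic of interest; the function name and the six pairs of scores are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        # If every examinee has the same score, r cannot be computed.
        raise ValueError("correlation is undefined when one score set has no variance")
    return sum(a * b for a, b in zip(dx, dy)) / sqrt(ss_x * ss_y)


# Hypothetical scores for six examinees (illustration only).
part_task_scores = [22, 25, 19, 27, 24, 21]   # audiovisual part-task test
whole_task_scores = [5, 6, 4, 7, 6, 4]        # hands-on whole-task test
r = pearson_r(part_task_scores, whole_task_scores)
print(round(r, 2))
```

The explicit zero-variance check is a design choice: a criterion on which nearly every examinee earns the same score leaves little or nothing to correlate with.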

Use of Television in Testing. Television has been used in testing
primarily as a recording medium (Cockrell, 1974; Hays & Pulliam, 1974).
A study by Shriver (Shriver, Hayes, & Hufhand, 1974) explored the
possibilities of using television as the stimulus input in a
performance test. After developing the test, Shriver concluded that
television did not offer much promise in terms of replacing hands-on testing.
He listed eight disadvantages of the television medium and decided to
abandon the method and not attempt a systematic comparison between the
television test and hands-on performance tests. Some of the
disadvantages mentioned follow:

1. Television tests place the subject in a passive role, watching
someone else perform and evaluating the correctness of the performance.
There is no reason to believe that success in this evaluation role will
insure success in the active role of performing the task.

2. Television violates a major ground rule of criterion-referenced
testing in that it emphasizes process measurement rather than product
measurement.

3.  Television  costs  are  very  high  compared  to  those  of  slides  or 
graphics  because  of  the  large  amount  of  equipment  needed  and  the  large 
personnel  time  requirements. 

Shriver's criticisms are informative, but they do not necessarily
settle the case. The nature of the medium does include some practical
difficulties both in producing the stimulus tapes and in administering
the tests. However, these difficulties are minor compared to the complex
task of administering hands-on performance tests. If television can show
a useful correlation with job-sample tests and also show advantages over
written and other audiovisual tests, it may well be worth the extra cost.


Objectives 

The  primary  objectives  of  the  present  experiment  were  to  appraise 
some  of  the  practical  difficulties  in  using  television  as  the  stimulus 
input and to make a rough comparison among television, paper-and-pencil,
and hands-on performance tests. The secondary objective was to conduct
a checkout of a responding device (Telestrator) designed to permit
examinees to respond directly to images on a screen (see Appendix C).

Specifically,  the  objectives  were  as  follows: 

1.  Determine  the  feasibility  of  using  television  in  testing.  The 
items  under  consideration  here  were 

a.  Understandability  of  test  items 

b.  Ease  of  responding 

c.  Time  allotment  for  responding 

d. Difficulties and costs involved with administering
television tests.

2.  Determine  the  acceptability  of  television  testing  by  examinees. 

3. Compare the results made on the television test with those made
on the paper-and-pencil and hands-on performance tests.

4. Conduct a checkout of the Telestrator response device.


METHOD 

The overall method consisted of (1) producing a television test for
a sample of tasks from the job field of tank crewman (11E MOS), (2)
producing a parallel paper-and-pencil test covering the same items, and (3)
comparing the results made on those two tests with the results made on
an existing hands-on performance test that covered many of the same items.

The job field of tank crewman was selected because much prior
research had been done in this field. A complete task analysis was
available, and a hands-on performance test has been in use for the Tank
Crewman Advanced Individual Training course for 2 years. This existing
hands-on performance test was felt to be a good base against which to
compare the television and paper-and-pencil tests.




Television  Test 


The  first  step  in  producing  the  television  tape  was  to  select  the 
critical  tasks  in  consultation  with  military  experts.  The  selection 
criteria  were  set  by  the  military  and  included  such  considerations  as 
importance  to  fulfilling  the  mission,  safety  to  the  crewman,  and  safety 
to  the  equipment.  The  critical  tasks  selected  were  quite  similar  to 
the  tasks  covered  in  the  Tank  Crewman  Advanced  Individual  Training 
course.  After  the  critical  tasks  were  selected,  they  were  ordered 
according  to  skill  level. 

For  the  final  test,  tasks  were  selected  from  skill  levels  1,  2,  and 
3.^  For  the  purposes  of  this  experiment,  the  tasks  can  be  considered 
to  range  from  fairly  easy  to  very  difficult.  Tasks  were  also  selected 
such  that  each  of  the  four  positions  (driver,  loader,  gunner,  and  tank 
commander)  was  covered,  and  a few  tasks  pertained  to  the  crew  at  large. 

In  consultation  with  the  military,  each  task  was  broken  down  into 
cognitive,  perceptual,  and  motor  components;  and  each  response  component 
was  examined  for  its  criticality  to  the  task.  Practical  considerations 
such  as  overall  test  running  time,  time  to  televise  each  item,  number  of 
response  components  needed  to  cover  a particular  task,  and  achieving 
a balanced  test  (see  Appendix  A)  eliminated  many  critical  response 
components.  For  each  of  the  remaining  critical  response  components  a 
television  test  item  was  conceived  and  a television  shooting  script  was 
written.  Each  item  was  televised  in  a crude  fashion  with  a handheld 
camera  and  a portable  videotape  recorder. 

The  raw  footage  was  edited  roughly  into  a prototype  television  test 
by  the  addition  of  narration  and  titles.  The  prototype  tape  was  intended 
only  as  a model  for  a professional  tape  to  be  produced  later  and  as  a 
vehicle  to  check  technical  accuracy  and  television  feasibility.^ 

Military experts checked the prototype tape for technical accuracy
and understandability. A revised television script incorporated
suggestions; a final television tape was produced using professional
television personnel, cameras, and editing facilities. The shooting and
editing of this final tape required approximately 30 calendar days (about
15 actual working days).

The final tape consisted of 47 test items plus 4 practice items and
had a running time of 53 minutes. The items ranged in running time from
35 seconds to 3 minutes with an average of approximately 60 seconds.
Ten seconds of the time for each item was allotted for the examinee's
response.

^There are five skill levels for each MOS ranging from skill level 1
(beginning) to skill level 5 (most advanced).

^Work on the preliminary television tape and the task selection required
to produce it were done by Human Resources Research Organization under
contract to the U.S. Army Research Institute.
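The timing figures for the final tape can be cross-checked with simple arithmetic. The sketch below assumes that the 53-minute running time covers all 51 items (test plus practice) and that the 10-second response allotment applies to every item; neither assumption is stated explicitly in the text.

```python
TEST_ITEMS = 47
PRACTICE_ITEMS = 4
RUNNING_TIME_MIN = 53
RESPONSE_SEC_PER_ITEM = 10

# Assumption: the running time spans test and practice items together.
total_items = TEST_ITEMS + PRACTICE_ITEMS
mean_item_sec = RUNNING_TIME_MIN * 60 / total_items

# Share of an average item that is given over to the response period.
response_share = RESPONSE_SEC_PER_ITEM / mean_item_sec

print(f"{total_items} items, mean {mean_item_sec:.1f} s per item")
print(f"response period is about {response_share:.0%} of the mean item")
```

Under these assumptions the mean comes out slightly above one minute per item, which is consistent with the reported average of approximately 60 seconds.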


Table  1 provides  a description  of  the  final  television  tape.  The 
categorizing  of  response  components  into  perceptual,  cognitive,  or 
motor types was somewhat intuitive. The intent was to show the
predominant element of each response component and not to imply that
other  elements  were  not  present. 


Of the 47 items shown in Table 1, only 37 were administered
to the examinees in the experiment and only 30 were scored. Most skill
level 3 items were eliminated before the experiment upon the
recommendation of the military staff at the Armor Center. These items were
considered too advanced for the examinees. After the start of the
experiment, several military advisers recommended the elimination of
six more items, and one item was eliminated due to a poor television
picture. These seven items were administered but not scored. The
footnotes in Table 1 give the reason for the elimination of any item
and also explain why certain items were not included on the hands-on
test.
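The item counts in this paragraph can be reconciled as follows. The sketch uses only the figures given in the text; the count of items dropped before the experiment is derived by subtraction and is not stated directly.

```python
ITEMS_ON_TAPE = 47
ITEMS_ADMINISTERED = 37
DROPPED_ON_ADVISER_RECOMMENDATION = 6   # eliminated after the experiment began
DROPPED_FOR_POOR_PICTURE = 1

# Derived, not stated in the text: items removed before the experiment.
removed_before_experiment = ITEMS_ON_TAPE - ITEMS_ADMINISTERED

# Items that were administered but left out of scoring.
administered_not_scored = DROPPED_ON_ADVISER_RECOMMENDATION + DROPPED_FOR_POOR_PICTURE
items_scored = ITEMS_ADMINISTERED - administered_not_scored

print(removed_before_experiment, administered_not_scored, items_scored)
```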


A more specific description of each response type shown in Table 1
follows:


(1) Multiple choice. The examinee was required to select one
answer from a list of three, four, or five alternatives. These
alternatives were sometimes the same as those in the usual
paper-and-pencil test (namely, words on the screen) and sometimes
consisted of images on the screen.


(2)  Error  detection.  The  examinee  was  required  to  watch  a procedure 
being  performed  on  the  screen  and  to  indicate  the  time  and  location  of 
an error, if one occurred, at the time it occurred. The examinee was
shown  the  procedure  twice  and  responded  on  the  second  showing. 


(3) Motor manipulation. The examinee placed a plastic gun reticle
(those reticles used with the main gun in the M60A1 tank) on various
stationary and moving targets as if preparing to fire the main gun. The
reticles  were  also  used  to  simulate  the  adjustment  of  fire  that  would 
be  made  if  the  first  round  missed  the  target.  The  motor-manipulation 
response  was  supposed  to  be  a crude  simulation  of  the  actual  response 
in  aiming  the  main  gun.  However,  the  movements  required  were  so  far 
down  on  the  scale  of  response  fidelity  that  the  motor  component  appeared 
not  to  be  measured  at  all.  Perhaps  the  reticle  response  was  primarily 
perceptual  and  cognitive. 


The paper-and-pencil test paralleled the television test on an
item-by-item basis. The stimulus input on this test was primarily
printed words, but some pictures and drawings were used on perceptual
items. Table 1 shows the stimulus input for each item.

As with the television test, only 37 of the paper-and-pencil test
items were administered and only 30 were scored. The items scored
were the same as those scored for the television test.

The paper-and-pencil items and the television items differed greatly
in the amount of time allotted to respond to each item. The total time
limit was the same for both tests; however, examinees could allocate
the response time any way they chose on the paper-and-pencil test but
were restricted to 10 seconds per item on the television test.

On the paper-and-pencil test, examinees could change their answers,
skip items and answer later, and review their answers; on the
television test, none of this flexibility was permitted.

These differences between the two tests were retained because each
medium lends itself most readily to the type of procedure used. Any
other procedures or a common procedure for both tests would have required
much more control and thereby reduced administration feasibility.


Hands-On  Performance  Test 


The hands-on test was one routinely administered to tank crewmen
trainees as a final examination for the Advanced Individual Training/
Armor course. This test was given in the form of a county fair with
8 stations and 30 performance measures. Examinees were graded on a
"go/no-go" basis for each performance measure. For each no-go, examinees
were required to seek out remedial training and report back later for a
retest. If the retest was a no-go the examinee had to report back the
next day, after further remedial training, for a second and final test.
For the purposes of the present experiment, the score recorded for each
examinee was the number of first-round no-go's. This was not a
particularly good criterion because the number of no-go's was very small.
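The criterion score described above is a simple count, and a short sketch shows why a small number of no-go's makes it a weak criterion. The function name, the example record, and the group scores are hypothetical illustrations, not data from the study.

```python
from statistics import pvariance


def first_round_no_gos(results):
    """Count "no-go" outcomes among an examinee's first-round results."""
    return sum(1 for outcome in results if outcome == "no-go")


# Hypothetical first-round record: 30 performance measures, two failed.
record = ["go"] * 28 + ["no-go"] * 2
print(first_round_no_gos(record))

# Hypothetical group in which most examinees have no first-round no-go's.
group_scores = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0]
print(pvariance(group_scores))
```

When nearly every score is zero, the variance of the criterion is small, and any correlation with another test is computed from very little information.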


Response  Equipment 

A secondary objective of the study was to check out the television
response equipment (Telestrator). This equipment consists of a clear
plastic electronic tablet and associated recording and programing
components. The electronic tablet covers the television screen (the
tablet  is  approximately  S inch  away  from  the  screen  at  the  center  of 
the  screen  and  approximately  1 inch  away  at  the  edges  of  the  screen). 


The examinee looks through the tablet to view the test items. Responses
are made by touching the face of the tablet with an electronic stylus or
an electronic gun reticle at a particular time and location. Before the
test, the correct answers (time and location) are programed on the
television tape. During the test, examinees are credited with a correct
answer if they touch the screen at the correct preprogramed time and
location. Any other response by an examinee is counted as incorrect. Only
one answer is permitted for each item, and the first answer (correct or
incorrect) made during the 10-second response period is counted.
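The scoring rule just described can be sketched as a small function. This is not the Telestrator's actual logic, which the text does not detail: the `Touch` and `AnswerKey` types, the use of a time window and a rectangular target area for "time and location," and the treatment of a missing response as incorrect are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List

RESPONSE_PERIOD_SEC = 10.0  # response period per item, as given in the text


@dataclass
class Touch:
    t: float  # seconds since the start of the response period
    x: float  # screen position in arbitrary units (hypothetical)
    y: float


@dataclass
class AnswerKey:
    # Hypothetical encoding of the preprogramed time and location:
    # a time window plus a rectangular target area.
    t_min: float
    t_max: float
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def matches(self, touch: Touch) -> bool:
        return (self.t_min <= touch.t <= self.t_max
                and self.x_min <= touch.x <= self.x_max
                and self.y_min <= touch.y <= self.y_max)


def score_item(touches: List[Touch], key: AnswerKey) -> bool:
    """Credit the item only if the first touch in the period matches the key."""
    in_period = [c for c in touches if 0.0 <= c.t <= RESPONSE_PERIOD_SEC]
    if not in_period:
        return False  # assumption: no response is scored as incorrect
    first = min(in_period, key=lambda c: c.t)  # later touches are ignored
    return key.matches(first)


key = AnswerKey(t_min=2.0, t_max=4.0, x_min=10, x_max=20, y_min=10, y_max=20)
print(score_item([Touch(3.0, 15, 15)], key))
print(score_item([Touch(1.0, 15, 15), Touch(3.0, 15, 15)], key))
```

In the second call the examinee touches the right place too early and then again at the right time; because only the first answer counts, the item is scored as incorrect.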

The response equipment was in prototype form and because of
operational difficulties could not be used for the experiment. However, it
proved possible to test the operating concept of the equipment by placing
a human  grader  behind  each  examinee  and  having  this  observer  record  on  a 
sheet  of  paper  whether  the  examinee  touched  the  correct  location  at  the 
correct  time.  This  grading  task  was  quite  simple,  and  during  a pilot  run 
with  eight  examinees  there  were  no  difficulties  in  grading. 

The television monitors were black and white and measured 15 inches
diagonally. The examinees sat approximately 2 feet from the sets at
self-regulated distances so that they could manipulate the response
implements comfortably. Prior to the start of the experiment, it was
decided to remove the electronic tablets from in front of the screens
because of parallax problems. After the tablets were removed, the accuracy
of the responding and scoring improved to a very precise level.

The  response  implements  consisted  of  a stylus  used  for  all  multiple- 
choice  and  error-detection  items,  and  two  plastic  gun  reticles  used  for 
motor-manipulation  items.  The  stylus  was  simulated  by  using  the  eraser 
end  of  an  ordinary  lead  pencil.  The  two  plastic  gun  reticles,  the  same 
design  as  the  M32  and  M105D  main  gun  reticles  in  the  M60A1  tank,  were 
manipulated  by  small  wooden  knobs  glued  to  the  plastic  reticles. 


Examinees 


The  examinees  were  tank  crewmen  who  had  just  completed  the  Advanced 
Individual  Training/Armor  course.  Altogether,  134  examinees  assigned 
from  three  different  companies  were  tested.  Examinees  were  drawn  from 
the  companies  by  a selection  process  best  described  as  haphazard  rather 
than  random;  however,  there  is  no  reason  to  believe  that  selective  bias 
was  present.  As  each  group  of  examinees  arrived  for  the  experiment  for 
each  session,  the  group  was  randomly  assigned  to  the  television  or  paper- 
and-pencil test. Originally, 144 examinees were scheduled for the
experiment, but 2 were lost due to scheduling problems and 8 were lost due
to  scoring  problems. 

Procedure 


Testing was conducted over a 5-day period in three morning and five
afternoon sessions. The actual schedule and distribution of examinees
are given in Table 2.


Table 2

Schedule and Distribution of Examinees

Test and                           Days
time of day            1     2     3     4     5    Totals

Television            16    16    16    11    11      70
  Morning              8     8     8    --    --      24
  Afternoon            8     8     8    11    11      46

Paper-and-pencil      16    11    18    11     8      64
  Morning              8     4    10    --    --      22
  Afternoon            8     7     8    11     8      42


Each group of subjects reported at 0800 or 1300 and was given an
orientation session explaining the purpose of the experiment. All of
the paper-and-pencil group was administered the paper-and-pencil test
right after the orientation. The television test was administered to
four examinees at a time; the rest of the television group was assigned
to a waiting room. Both the television and the paper-and-pencil tests
required approximately 1 hour to complete.

Approximately 10 minutes of training were required to teach the
examinees the methods for responding to the television items. Most of
this training was concentrated on the use of the plastic reticle. The
examinees were trained by having them respond to the four practice test
items. If any examinee had difficulty with the reticles, such as choosing
the incorrect reticle or holding reticles incorrectly, the tape was
stopped and the four practice items presented again. In no case was it
necessary to present the practice items more than twice.


RESULTS

Feasibility of Using Television in Testing

The examinees did not appear to have any difficulty in understanding
the items. All of the content had been covered in the Advanced Individual
Training course, and the examinees had been tested on similar items
several times. All of the items were also performance based and posed
questions that occur normally in everyday operations.

The responding proceeded smoothly for most items. The examinees
responded very quickly on the easy items (approximately 1-2 seconds
with the stylus, 3-4 seconds with the reticles). On difficult items,
the amount of response time allotted (10 seconds) still appeared ample,
although there usually would be a lot of hesitating over the answers.

On only a few items the examinees failed to respond. When queried
after the completion of the test about the amount of response time,
most examinees indicated that for the most part the response time was
adequate. A few examinees said that more response time should have
been allotted to some items.

The administration of the television test was more time consuming
than that of the paper-and-pencil test because of the need to provide
preliminary training in the correct way to respond and the limit of
four examinees per session. Administration could be made more feasible
by increasing the number of television monitors, but it would still be
advisable to have one test administrator for each four examinees because
of the examinees' unfamiliarity with the response method. Compared to
the administration time for hands-on testing, however, television testing
is much less costly.


Acceptance  of  Television  Testing 

The reaction of the examinees to the television test appeared to be
quite favorable. Postexamination interviews indicated that most
examinees actually preferred the television test to the hands-on test
and all examinees thought the television test was fair. Even when
queried about the test's being used as a basis for promotion or extra
pay, the examinees still thought it was fair. Some examinees preferred
the hands-on mode of testing, but no one preferred the paper-and-pencil
mode.

Some reasons mentioned for preferring the television mode follow:

1. Scoring is fairer and not dependent upon the whims of the test
administrator.

2. Testing is faster and not so drawn out.

3. In television testing no one is shouting at you and ordering
you around.

Some of the reasons for preferring the hands-on mode follow:

1. There is more time to think and to respond.

2. Testing is more spread out and doesn't come so fast.

3. Television hurts the eyes.


4.  There  is  a chance  to  walk  around  between  items. 

The examinees also indicated that television testing would be
better than paper-and-pencil testing because the questions would be
more understandable and require much less reading.


Comparison of Television, Paper-and-Pencil, and Hands-On Tests

The  comparison  between  the  mean  percent  error  made  on  the  television 
test  and  that  made  on  the  paper-and-pencil  test  is  shown  in  Table  3. 

The  means  for  the  television  and  paper-and-pencil  tests  do  not  differ 
to  any  great  degree,  indicating  that  the  difficulty  levels  of  the  two 
tests  are  fairly  equal. 


Table 3

Mean Percent Error Made on the Television and Paper-and-Pencil Tests

Test and                            Days
time of day             1       2       3       4       5      Mean

Television
  Morning             19.63   15.00   14.75     --      --     16.46
  Afternoon           27.88   29.38   28.00   26.09   27.09    27.54
  Unweighted mean                                              22.00

Paper-and-pencil
  Morning             28.38   20.75   22.3      --      --     24.23
  Afternoon           27.13   24.71   19.5    29.90   27.38    26.05
  Unweighted mean                                              25.14
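
The row means in Table 3 can be checked against the daily cell means. The following is a minimal sketch, assuming (this is an inference from the numbers, not something the report states) that each morning or afternoon row mean is weighted by the group sizes in Table 2 and that the "unweighted mean" is the simple average of the two row means. The names `cells`, `weighted_mean`, and `unweighted_mean` are illustrative only.

```python
# Each entry is (number of examinees from Table 2, mean percent error from Table 3).
# Days 4 and 5 had no morning sessions.
cells = {
    ("television", "morning"): [(8, 19.63), (8, 15.00), (8, 14.75)],
    ("television", "afternoon"): [(8, 27.88), (8, 29.38), (8, 28.00),
                                  (11, 26.09), (11, 27.09)],
    ("paper-and-pencil", "morning"): [(8, 28.38), (4, 20.75), (10, 22.3)],
    ("paper-and-pencil", "afternoon"): [(8, 27.13), (7, 24.71), (8, 19.5),
                                        (11, 29.90), (8, 27.38)],
}


def weighted_mean(pairs):
    """Mean of the daily means, weighted by the number of examinees per day."""
    total_n = sum(n for n, _ in pairs)
    return sum(n * mean for n, mean in pairs) / total_n


# Row means for each test and time of day.
row_means = {key: weighted_mean(pairs) for key, pairs in cells.items()}


def unweighted_mean(test):
    """Simple average of the morning and afternoon row means for one test."""
    return (row_means[(test, "morning")] + row_means[(test, "afternoon")]) / 2


for key, value in row_means.items():
    print(key, round(value, 2))
for test in ("television", "paper-and-pencil"):
    print(test, "unweighted mean", round(unweighted_mean(test), 2))
```

Under these assumptions the computed values agree with the printed means to within rounding, which suggests the cell sizes in Table 2 and the means in Table 3 are mutually consistent.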

One interesting facet of the data is that afternoon television
examinees made many more errors than the morning groups. These results
are convincing because they are consistent across the first 3 days of
the experiment and because the afternoon means for Days 4 and 5 are
consistent with the other afternoon means. There does not appear to be
any morning-afternoon effect for the paper-and-pencil test.

The analysis of variance using the unweighted means analysis for
unequal cell frequencies (Winer, 1962) is shown in Table 4. This
analysis shows no difference between the television test and the
paper-and-pencil test in terms of item difficulty. There was a significant
morning-afternoon effect, but the more meaningful result is the
significant mean square (MS) interaction. Analysis of this MS
interaction reveals that the morning-afternoon effect is concentrated on
the television test and not on the paper-and-pencil test.

In order to check on whether the afternoon examinees may have been
less qualified than the morning examinees, the first round no-go's
from the hands-on test were analyzed. These results are shown in Table
5. Inspection of the means indicates little difference between the
television and paper-and-pencil groups, or between the morning and
afternoon groups. If anything, the afternoon group performed slightly
better than the morning group. An analysis of variance of these
results showed no significant difference for any of the variables.


Table 4

Analysis of Variance for Television and Paper-and-Pencil Tests

Source                               df       MS        F        p

TV vs. P & P (method)                 1      9.50     1.04      ns
Morning vs. afternoon (session)       1    124.57    13.60    <.01
Method x session                      1     58.06     6.40    <.02
Within cell                         130      9.16
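
Each F in Table 4 is the ratio of an effect mean square to the within-cell mean square. The short sketch below recomputes those ratios from the printed mean squares; the variable names are illustrative, and the within-cell degrees of freedom are taken as the 134 examinees minus the 4 method-by-session cells.

```python
# Mean squares as printed in Table 4.
mean_squares = {
    "method": 9.50,
    "session": 124.57,
    "method x session": 58.06,
}
ms_within = 9.16

# F ratio for each effect: effect mean square over within-cell mean square.
f_ratios = {source: ms / ms_within for source, ms in mean_squares.items()}

# Within-cell degrees of freedom: total examinees minus number of cells.
df_within = 134 - 4

for source, f_value in f_ratios.items():
    print(f"{source}: F(1, {df_within}) = {f_value:.2f}")
```

The method and session ratios agree with the printed F values. The interaction ratio computed this way comes out slightly below the printed 6.40, which may reflect rounding or a misread digit in the table; the conclusion drawn in the text does not depend on the difference.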

Although overall scores on the television and paper-and-pencil
tests did not differ, there might be differences among the various
items. Accordingly, the items were grouped by response type (multiple
choice, error detection, and motor manipulation) and log linear
Chi-square tests (Shaffer, 1973) were computed for each item. Table 6
shows that there was a wide variation of difficulty among the items
ranging from 0 to 81% error. For the multiple-choice items, there
was little difference between the television and paper-and-pencil
versions. Only 1 of 13 items showed a significant difference. For
the error-detection items there was a substantial difference, with
six out of nine items showing a significant difference.


Table 5

Mean Percent Error Made by the Television and
Paper-and-Pencil Groups on the Hands-On Test


Test and                               Days
time of day              1         2         3         4         5       Mean

Television
  Morning              . 00     t> . 00    ‘).52       --        --     «. 10
  Afternoon            0.00      0.00      0.52      0.50      2.9U     0.52
  Unweighted mean                                                       7.T4

Paper-and-pencil
  Morning             12.00     f. . 00   « . 00       --        --     9 . 0»
  Afternoon            7.52      9.72      5.52     12.00      0.52     rt.4«
  Unweighted mean                                                       h’.'7H

It is interesting to note that errors of commission are more
difficult to detect on television, whereas errors of omission and no-error
items are more difficult to detect on paper-and-pencil. Three of the
eight motor-manipulation items show some significant difference, and
all three of these items show more difficulty for the television test.
The net result of this item difficulty analysis shows five items more
difficult on television tests and five items more difficult on
paper-and-pencil tests. This canceling effect is reflected in the overall
nonsignificant difference between the television test and the
paper-and-pencil test.

The last analysis, in Table 7, shows the correlations of the hands-on
test with the paper-and-pencil test and the television test. These
correlations are also broken down for the morning and afternoon groups.
There is a low positive correlation between the television and hands-on
tests and also between the paper-and-pencil and hands-on tests. The
paper-and-pencil correlation is significantly different from zero;
however, there is no significant difference between the television
versus hands-on and the paper-and-pencil versus hands-on
correlations. The breakdown for morning and afternoon groups shows a
somewhat higher positive correlation for the afternoon group and very
little correlation for the morning group. Once again, there is no
significant difference between the television and paper-and-pencil
correlations with the hands-on test.


Table 6

Comparison Between Television and Paper-and-Pencil
Items Percent Error Arranged by Response Types

Response type and           % error
item number               TV     P & P    Chi-square

Multiple choice
   1                       1       0         ns
   2                       0       2         ns
   4                      43      33        1.43
   5                      21      14         .82
   6                      38      72       13.83
   7                      47      52         ns
   8                      26      20         ns
   9                      16      12         ns
  10                      17      16         ns
  11                       3       5         ns
  12                       0       0         ns
  17                       1       3         ns
  32                      81      77         ns

Error detection (commission)
   3                      51      28        7.47
  13                      11      11         ns
  14                      40      12       11.09
  37                      46      56        1.42

Error detection (omission)
  16                      19      52       14.88
  18                       4      47       18.90
  23                      13      45       11.26

Error detection (no error)
  15                      16      19         ns
  20                      17      66       30.36

Motor manipulation (reticles)
  19                      46      16       12.24
  21                      51      17       16.17
  24                       4       5         ns
  25                      27      11        3.75
  26                      11       2         ns
  27                      14      11         ns
  28                      23      34        1.45
  30                      23      28         ns
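
To illustrate the kind of item-level comparison summarized in Table 6, the sketch below computes a likelihood-ratio chi-square (G squared) for a 2 x 2 table of test version by error or no error. This is only an assumption about the form of the statistic: the report cites log linear Chi-square tests (Shaffer, 1973) without giving the formula. The error counts are approximated from the rounded percentages, assuming all 70 television and 64 paper-and-pencil examinees answered the item, so the result is not expected to reproduce the printed values exactly. The function name `g_squared` is hypothetical.

```python
import math


def g_squared(errors_a, n_a, errors_b, n_b):
    """Likelihood-ratio chi-square for a 2 x 2 table (group by error/no error)."""
    observed = [[errors_a, n_a - errors_a], [errors_b, n_b - errors_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [errors_a + errors_b, total - errors_a - errors_b]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            e = row_totals[i] * col_totals[j] / total
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += 2 * o * math.log(o / e)
    return g2


CRITICAL_05 = 3.84  # chi-square critical value, 1 degree of freedom, p = .05

# Item 6: 38% error on television, 72% on paper-and-pencil (Table 6).
item6 = g_squared(round(0.38 * 70), 70, round(0.72 * 64), 64)

# Item 7: 47% error on television, 52% on paper-and-pencil (Table 6).
item7 = g_squared(round(0.47 * 70), 70, round(0.52 * 64), 64)

print("item 6 significant:", item6 > CRITICAL_05)
print("item 7 significant:", item7 > CRITICAL_05)
```

With these approximated counts, item 6 exceeds the .05 critical value and item 7 does not, in line with the pattern of significant and nonsignificant entries in the table.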

Table 7

Correlations Between the Television and Paper-and-Pencil
Scores and the Hands-On Performance Scores

Time of day and tests correlated        r        p

Overall
  Television vs. hands-on              .24      ns
  Paper-and-pencil vs. hands-on        .33     <.01
  (difference is nonsignificant)

Morning
  Television vs. hands-on             -.09      ns
  Paper-and-pencil vs. hands-on        .10      ns
  (difference is nonsignificant)

Afternoon
  Television vs. hands-on              .47     <.01
  Paper-and-pencil vs. hands-on        .40     <.01
  (difference is nonsignificant)
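
The statement that the two correlations in each pair do not differ significantly can be illustrated with Fisher's r-to-z comparison for independent samples. This is a sketch only: the report does not say which test was used, and the group sizes are assumed to be those of Table 2 (70 and 64 overall, 24 and 22 in the morning, 46 and 42 in the afternoon). The function name is illustrative.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher r-to-z transform
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


# Television vs. hands-on compared with paper-and-pencil vs. hands-on.
comparisons = {
    "overall": fisher_z_difference(0.24, 70, 0.33, 64),
    "morning": fisher_z_difference(-0.09, 24, 0.10, 22),
    "afternoon": fisher_z_difference(0.47, 46, 0.40, 42),
}

for label, z in comparisons.items():
    # |z| below 1.96 means no significant difference at the two-tailed .05 level.
    print(label, "differs significantly:", abs(z) > 1.96)
```

Under these assumptions none of the three differences reaches the two-tailed .05 level, which is consistent with the "nonsignificant" entries in Table 7.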


CONCLUSIONS  AND  DISCUSSION 

The results from this research indicate that it is possible to produce
a synthetic test using television as the stimulus input. The examinees
can understand the problems, make proper responses, and accept the test
as "fair" for career evaluation.


The experience gleaned from the production and administration of this
prototype test indicates that television testing is more costly than
paper-and-pencil testing but far less costly than hands-on testing. The
production of the tape, from conception to final editing, required several
months and used the services of a substantial number of professional
people. Television tests are also somewhat inflexible, not only in the
difficulty in effecting changes in the test, but also in the timing
decisions (the amounts of time to allot for posing each question and for
each response) that have to be made before the production of the test.

Television testing will have a much more promising future if a
presentation and response device can be designed which will permit the
examinee to advance to the next item as soon as the present one is
answered, to see the same item twice, to change answers to an item, and
to review the entire test. Such a capability would permit the flexibility
of presenting multipart items, such as in troubleshooting, and would
permit the presentation of multimedia items, such as using both television
and technical manuals in the same item.

The present experiment provides evidence that television testing
is highly acceptable to the examinees. Their predominant attitude was
that the test was little different from the hands-on tests in the
Advanced Individual Training course, except that television was quicker
and less subject to scoring error. All of the scenes were quite
familiar to the examinees, and the items were ones that they had been
studying for 8-13 weeks.

Television used in the multiple-choice format appears to offer no
advantage over slide or paper-and-pencil formats. Before the
experiment, it was felt that television would offer an advantage for those
items in which motion was an integral part of the stimulus. For
example, Spangenburg (1973) has shown that watching a television display of
a procedure involving motion leads to more learning than watching a
sequence of still shots. However, this advantage of motion proved to
be true for one motion item in the present research (item 6, Table 6)
but not true for two items (5 and 7). Perhaps if more motion-type
items had been included in the multiple-choice category an advantage
might have been shown.

In the error-detection category there did appear to be a clear-cut
difference between television and paper-and-pencil items. Here the
fidelity of the stimulus did seem to play a role, and the enriched
stimulus of the television picture may have presented cues to the
examinees. The two error-detection items that proved to be more
difficult for television examinees (items 3 and 14, Tables 1 and 6) were
two of the first error-detection items to be presented. Since error
detection was an unfamiliar response for the examinees, this
unfamiliarity may have caused some difficulty. This same phenomenon can be
seen in the motor-manipulation items, which involve an even more
unfamiliar response. Here the television examinees had more difficulty
with the first few items than did the paper-and-pencil examinees.

The correlations between the synthetic and the hands-on tests are
too low to warrant recommending the substitution of synthetic for
hands-on tests. However, the correlations for the afternoon groups are high
enough to encourage further research. The hands-on criterion test used
in the present experiment was somewhat unsatisfactory because of the
large number of perfect scores.

The drop in the scores on the television test for the afternoon
group as compared to the morning group was interesting but unexpected.
One possible explanation for the drop may be that the examinees were
required to stare continuously at a fairly large television screen from
a very close distance for approximately 1 hour. A human being may be
able to tolerate this strain in the morning but by the afternoon
accumulated fatigue plus a heavy Army lunch may have combined with the strain
to produce a letdown.

The results of this study favored the feasibility of television
testing. The tests can be produced at a reasonable cost, can be
administered in a reasonable manner, are understandable by the examinees,
and have high acceptability with the examinees.

The validity of the results was inconclusive. The criterion scores
for the hands-on test were unsatisfactory in that most hands-on examinees
made a perfect score. The correlation between the television and hands-on
tests was low positive but nonsignificant. Comparison between parallel
television and paper-and-pencil tests also showed no difference on an
overall basis, although there were significant differences between
many items.

Evidence from this study is insufficient to conclude that synthetic
performance tests with television inputs can be used to replace hands-on
performance tests.


APPENDIX A


LIMITED GUIDELINES FOR THE DEVELOPMENT OF TELEVISION TESTS

Very little evidence is available as to the best way to present
test items on television. The only published research for military
tests is Shriver's (Shriver et al., 1974) and as noted before, the
conclusions from this research were very negative. Most decisions
made for the present test were based on paper-and-pencil-test
development, expert opinion, and experience. Very few hard-and-fast
guidelines can be offered because so many decisions depend upon the
format chosen, the type of questions, and the amount of time available.

An important limiting factor in the development of television
tests is the amount of running time available for each test item and
for the complete test. The maximum desirable time for a television test
such as the present one is 50-60 minutes for a number of reasons,
including eyestrain, general fatigue, and administrative cost. One
advantage of television testing over hands-on testing is the low
administrative cost per examinee. The longer the television test, the less
the advantage.

Although experience with television testing is too limited to offer
much in the way of guidelines, it may be useful to describe the
developmental stages and some of the difficulties encountered.

Prior to the development of the test it was decided to aim for a
50-60 minute running time, to cover the MOS of tank crewmen at skill
levels 1, 2, and 3 and the job positions of driver, loader, gunner, and
tank commander. The test was to be a group test with individual TV
screens and the examinees were to respond to the items by touching the
face of the television screen with a stylus or reticle.

The first selection step was to ask various military training
departments (gunnery, automotive, and such) to submit a list of critical
tasks which should be tested. These departments submitted a total of 75
tasks. Because only a limited number of tasks could be used on the final
tape, the list had to be pared down considerably. Many tasks were
eliminated in order to balance the test among skill levels and crew positions.
For example, 40 of the tasks received from the departments were for skill
level 3 and only 5 of these tasks were on the final tape. Most remaining
excess tasks were eliminated simply by deciding to limit the test to tasks
associated with the actual operation of the tank. Critical areas such as
drug abuse, first aid, leadership, and tactical decisions, and complex
tasks, such as sketching an area map and tasks that required excessively
long television running times, were eliminated.

The first step in developing specific test items for each task was
to list all response components making up a task and decide whether each
response component was primarily cognitive, perceptual, or motor. Each
cognitive or perceptual response component was then examined for criticality
and feasibility for television testing. Depending upon the number of
critical and feasible response components making up a task, a decision
was made to include one or more test items for the task (the number of
test items per task ranged from one to six as shown in Table 1). Primarily
cognitive response components were tested as error-detection and
multiple-choice items. Primarily perceptual components were tested as
motor-manipulation and multiple-choice items.

Some of the trial-and-error observations that can be made for each
response type are as follows:

1. Multiple choice. Items of this sort are very simple to conceive and
develop and require very little test time (about 30-45 seconds) provided
choices are presented simultaneously such as on a four-way split screen
or four words on the screen. Presenting the choices serially creates
difficulties not only in terms of greatly increased running time but also
because the examinees often forget the first choice by the time they see
the last one. Either the choices have to be presented twice (responding
occurs on the second presentation) or the examinees must respond "yes" or
"no" to each choice as it appears. Neither method is very satisfactory.

Rationally, presenting multiple-choice items on television does not
offer too much advantage over a paper-and-pencil format except in terms
of reducing the need for reading and perhaps presenting a more easily
understood item. For example, the motion and sounds associated with
television may be helpful in understanding the item.

2. Error detection. This response type has been criticized harshly by
Shriver (see page 7), and there are other difficulties as well. One
major difficulty is in producing the item (televising the procedure
accurately). If a no-error item is desired, it is necessary to find an
actor who can carry out the procedure without error. All too often,
expert advisers cannot agree on the correct procedure. Many repetitions
of each scene have to be made before the experts and actors can reach
some sort of compromise, and even then there remain logical and inherent
difficulties which cannot be resolved. For example, in televising items
for the load round into main gun task it was necessary to choose between
showing the action at normal speed or in slow motion. When the action
was shown at normal speed, no examinee could discriminate the crucial
element (hand position) and the item had no meaning. When the action was
shown in slow motion (so the crucial element could be seen), examinees
criticized the slowness itself as an error.

Another major difficulty with error-detection items concerns very
slight deviations from prescribed procedure which often escape the
scrutiny of expert advisers. Exceptionally well-skilled examinees may be
lured into pointing to the slight deviations as errors, while the less
skilled never notice the slight deviations and point to the major intended
error. This was particularly true for the response format used in the
present study, where the first response made by the examinee was scored
and all subsequent responses to the same item were ignored.

The error-detection response type was included to test the examinees'
knowledge of incorrect as well as correct actions. Many incorrect actions
occur very infrequently but can be very serious when they do occur.
Performing the correct action in a hands-on test does not necessarily
indicate awareness of danger points, and examinees need to be tested
directly as to awareness of incorrect actions. However, as Shriver
points out, watching someone else perform is very different from doing
it yourself. Failing to notice an error may indicate lack of knowledge or
it may indicate inability to notice error in others.
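
The first-response scoring rule described above, and the way it can penalize a skilled examinee, can be sketched as follows. This is a hypothetical illustration rather than a description of the actual response equipment: the representation of a response as a (seconds after the item, choice) pair, the function name `score_item`, and the treatment of a missing response as unscored are all assumptions. The 10-second window is the response time mentioned in the Results section.

```python
from typing import List, Optional, Tuple

RESPONSE_WINDOW_S = 10.0  # response time allotted per item (assumed for all items)


def score_item(responses: List[Tuple[float, str]], correct: str,
               window: float = RESPONSE_WINDOW_S) -> Optional[bool]:
    """Score only the first response made inside the response window.

    Each response is a (seconds after the item ended, choice) pair.
    Returns True or False for the first in-window response, or None
    when the examinee made no response in time.
    """
    in_window = sorted(r for r in responses if 0.0 <= r[0] <= window)
    if not in_window:
        return None
    _, first_choice = in_window[0]
    return first_choice == correct  # later responses are ignored


# A skilled examinee notices a slight deviation first, then the intended error.
skilled = [(2.0, "slight deviation"), (4.0, "intended error")]
# A less skilled examinee notices only the intended error.
less_skilled = [(5.0, "intended error")]

print(score_item(skilled, "intended error"))
print(score_item(less_skilled, "intended error"))
```

Under this rule the skilled examinee is marked wrong even though the intended error was also identified, which is the difficulty the text describes.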

The overall conclusion is that error detection is a doubtful response
type and more thought and research are needed prior to its acceptance as
a useful procedure.

3. Motor manipulation. This response type was rather specific to this
particular study and the response equipment being evaluated. In fact, one
strong selling point of the response equipment was its provision for
testing the motor-manipulation items. All test items under this particular
response type pertained to where the examinee should place the reticle on
the television screen when simulating firing the main gun under various
conditions. However, analysis reveals that this response type is not
really a test of motor ability, but rather a test of a combination of
perceptual and cognitive abilities. The cognitive element was knowing
the correct lead and elevation for each target and the perceptual element
was being able to discriminate the correct lead and elevation. There is
no evidence to indicate that the ability to manipulate a plastic reticle
on a television screen has any correlation with the motor element involved
in aiming an actual gun. On the plus side, this response type is more of
a recall item than a multiple-choice question and therefore should provide
a more exact measure of recall. On the minus side is the requirement
to learn a new response quickly (manipulating plastic reticles). Incorrect
responses may be caused by lack of knowledge or perceptual ability, or
merely by failure to master the new response of manipulating plastic
reticles.

This  response  type,  like  the  error-detection  response  type,  needs 
much  more  thought  and  research  prior  to  acceptance  as  a useful  procedure. 

Some  comments  on  a few  miscellaneous  topics  may  also  be  useful: 

1. Use of a time period to indicate error. One item (3, Tables 1
and 6) on the television test used the passage of time as the cue for
the examinee to note an error. That is, the actor waited only 5 seconds
before continuing a procedure, where the prescribed procedure in the
technical manual calls for a 120-180-second wait. Some criticism has
indicated that this time-passage technique might confuse examinees
because Americans have been conditioned through exposure to motion
pictures and television to accept any length time period shown on a
screen as the appropriate time. Item 3 did seem to confuse many examinees.


2. Long items. Several items on the television tape had relatively
long running times (approximately 3 minutes). Some critics claim that
including such long items may be unwise because coverage of the total
subject matter is restricted at best, given a 50-60 minute time limit
for the test. Although a long item may not necessarily confuse the
examinees, it is noteworthy that 3 out of 4 long items retained in the
test did prove very difficult for the television examinees and all 4
of the long items omitted from the study appeared to be confusing
during pilot runs. Another reason for omitting long items is the
difficulty in getting an actor to perform a long sequence in
letter-perfect fashion. One very long item on the tape (evacuate injured
crewman) was never completely satisfactory. The final take was
accepted because the director became concerned with the safety of the
actor playing the role of injured crewman.

3. Resolution. Unlike the human eye, television cannot capture
both a wide-angle view and good resolution at the same time. For
scenes that require good resolution it is a good idea to zoom in on a
scene and remain there. To attempt to show more than one closeup in
any one sequence tends to confuse the viewer.

4. Restricted view. Even with a wide-angle lens, television gives
a very restricted view and care must be taken to provide setting shots.
Precise judgments as to the placement of controls are difficult to
make from a television picture.

5. Poor depth perception. Much depth perception is lost in a
television picture. Items that depend upon judgment of depth should
be omitted.

6.  Closeup  and  motion.  Any  kind  of  motion  in  a closeup  shot  is 
confusing.  Necessary  movements  must  be  very  slow  and  precise.  However, 
it  should  be  noted  that  slowness  is  perceived  as  an  error  by  many  people. 

More research needs to be accumulated before a more precise set of
guidelines can be produced for television testing. Particularly needed
is development of a more adequate stimulus presentation and
response-recording device. Also needed are researchers well grounded in the
capabilities and limitations of television and the use of television
cameras, lighting, and editing equipment. Television testing offers
much potential but before this potential can be reached, much preparatory
work remains to be done.
APPENDIX B


EXAMPLES  OF  PAPER-AND-PENCIL  ITEMS 


A.  Multiple  choice 


12.  You  are  the  driver  of  an  M60A1  tank.  What  response  do  you  make  to 
the  following  ground  guide  signal  given  at  night  with  a flashlight? 


a.  Move  backward 


b.  Start  engine 


d.  Turn  left 


6. You are the loader in an M60A1 tank. The tank commander gives the
following fire command:


GUNNER, BEEHIVE TIME, TROOPS, ONE SIX HUNDRED


The firing switch has been checked and the breech is open. You perform
the following six steps in order:


(1)  Select  a BEEHIVE  round. 

(2) Insert the round 2/3 of the way into the chamber.

(3) Push the round into the chamber with the heel of the right hand.

(4)  Clear  the  path  of  the  recoil. 

(5)  Turn  the  firing  switch  to  FIRE. 

(6)  Announce  "UP." 


Did  you  do  anything  wrong? 


a.  Step  (2)  is  wrong 


b.  Step  (3)  is  wrong 


c.  A step  is  missing 


Motor Manipulation (Reticle Manipulation)


FOR THE NEXT FOUR PROBLEMS ENGAGE ALL STATIONARY TARGETS AT CENTER OF MASS
AND ASSUME ALL MOVING TARGETS ARE TRAVELING AT 15 MPH.


23. You are the gunner on an M60A1 tank. The tank commander gives the fire
command:


GUNNER,  SABOT,  TANK 


Which of the following sight pictures would you take up using the periscope
reticle?


APPENDIX C


EVALUATION OF THE TELESTRATOR EQUIPMENT*

One of the reasons the television test was designed and produced
was to evaluate the Telestrator equipment (also known as the Telestar
equipment). The novel component of this equipment is an electronic
tablet which can be fitted over the face of a television screen. The
tablet will record the horizontal and vertical (XY) location when it
is touched with an electronic contact point embedded in a stylus or
similar indicator (such as a gunsight reticle). By the proper use of
auxiliary recording equipment it is possible to record the place and
time the screen is touched. The recording equipment includes a counter
which keeps a running total of the number of items, number of answers
attempted, and number of correct answers.

The complete system includes the electronic tablet, a programming
unit, and several student units. The programming unit is used by the test
developer to place electronically on the television tape the XY
coordinates for the correct answer for each test item and the time
period during which the equipment will accept this answer. The student
unit compares electronically the programed answer and the examinee's
answer and records the results.
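
The comparison performed by the student unit can be sketched in a few
lines of Python. This is an illustrative sketch only, not the
Telestrator's actual logic: the names ProgrammedAnswer, StudentUnit, and
score_touch, the rectangular answer box, and every coordinate and time
value are assumptions introduced here. The report states only that XY
coordinates and an acceptance time period were programed for each item
and that a counter tracked items, answers attempted, and correct answers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ProgrammedAnswer:
    """Hypothetical answer key for one item: an XY box plus a time window."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    t_open: float   # seconds after item start when answers are first accepted
    t_close: float  # seconds after item start when answers stop being accepted

    def accepts(self, x: float, y: float, t: float) -> bool:
        in_box = self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max
        in_window = self.t_open <= t <= self.t_close
        return in_box and in_window


@dataclass
class StudentUnit:
    """Running totals like those the report attributes to the counter."""
    items: int = 0
    attempted: int = 0
    correct: int = 0

    def score_touch(self, key: ProgrammedAnswer,
                    touch: Optional[Tuple[float, float, float]]) -> bool:
        """Score one item; touch is (x, y, t), or None if nothing was touched."""
        self.items += 1
        if touch is None:
            return False
        self.attempted += 1
        is_correct = key.accepts(*touch)
        if is_correct:
            self.correct += 1
        return is_correct


# Hypothetical item: a 2-inch-square answer box, accepted for the first 10 s.
key = ProgrammedAnswer(x_min=4.0, x_max=6.0, y_min=3.0, y_max=5.0,
                       t_open=0.0, t_close=10.0)
unit = StudentUnit()
unit.score_touch(key, (5.0, 4.0, 3.2))   # inside the box, in time
unit.score_touch(key, (5.0, 4.0, 12.0))  # right place, too late
unit.score_touch(key, None)              # no response
print(unit.items, unit.attempted, unit.correct)
```

In this sketch a late touch counts as an attempt but not as a correct
answer; whether the real equipment counted late touches as attempts is
not stated in the report.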

The student unit provides three types of feedback to the examinee
for each test item. Immediate feedback is provided by a high-pitched
tone and a small red light that comes on for a correct answer versus
a low-pitched tone and no light for an incorrect answer. Slightly
delayed feedback comes from a counter which shows new totals of items
and correct answers at the end of the programed time period for
answering each problem.


As to whether the Telestrator equipment has any merit or not, it
is necessary to examine both:

1. The equipment itself, as designed and produced, and

2. The overall testing strategy which includes (a) Individual
responding, (b) Television stimulus, (c) Immediate feedback,
and (d) Time limit on each response.


A. The Equipment


As with most newly designed equipment, the Telestrator contained
many bugs and never worked properly. However, it was possible to test
some aspects of the equipment by using human graders to record right
or wrong answers by the examinees according to the time and place the
screen was touched. Several pilot tests were run with the following
results.

*This summary of Telestrator operation was submitted previously to the
Training Support Division, TRADOC.


(1) Accuracy. There is a fundamental flaw in the Telestrator
design insofar as precise responding is concerned. The equipment was
claimed to be accurate to 1/4 inch. However, due to parallax the
actual accuracy was more on the order of 1 or 2 inches. This
grossness effectively eliminated the use of the reticle test items because
with any reasonable size reticle no discrimination was possible for
leads or ranges. The grossness also eliminated many test items in
which the examinee was required to discriminate among several tank
controls. The spacing between these controls as shown on the screen
was not great enough to permit exact programing of the answers, and
one answer box would overlap another. The parallax results from
mounting the electronic tablet at some distance from the actual
television screen (due to curvature of the television screen the parallax
increases as one approaches the edge of the screen).
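
The geometry behind this parallax can be illustrated with a short
calculation. The sketch below assumes a simplified model in which the
tablet sits a fixed gap in front of the picture and the examinee's line
of sight meets the tablet at some angle from the perpendicular; the
apparent offset is then the gap times the tangent of that angle. The
function name, the gap, and the angles are hypothetical values chosen
for illustration, not measurements from the study.

```python
import math


def parallax_offset(gap_in: float, angle_deg: float) -> float:
    """Apparent displacement (inches) between a point on the picture and
    the spot on the tablet that lies on the same line of sight.

    gap_in    -- assumed distance between tablet and picture surface
    angle_deg -- assumed angle of the line of sight from the perpendicular
    """
    return gap_in * math.tan(math.radians(angle_deg))


# Hypothetical 1-inch gap viewed at increasing angles off the perpendicular.
for angle in (0, 15, 30, 45):
    print(f"{angle:2d} deg: {parallax_offset(1.0, angle):.2f} in")
```

Under these assumptions the offset grows with both the gap and the
viewing angle. On a curved screen the gap itself widens toward the edge,
which is consistent with the report's observation that parallax
increased there.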

In  order  to  continue  the  experiment  and  test  the  idea  of  responding 
to television, the electronic tablets were removed from the television
monitors  and  the  examinees  were  required  to  touch  the  face  of  the 
actual  television  screens.  This  completely  eliminated  all  parallax  and 
permitted  the  use  of  reticle  items  and  other  precision  responding. 

(2)  Video  presentation.  The  electronic  tablet  is  constructed  in 
such a manner that it blocks a 1-inch-wide area around the outer edge of
the  television  screen.  This  is  a serious  limitation  because  it  is 
necessary  to  use  a small-size  monitor  for  such  closeup  work  and  this 
outer  1 inch  covers  a substantial  part  of  the  available  screen  area. 
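
How much picture a 1-inch border removes depends on the monitor size,
which the report does not give. As a rough illustration under assumed
dimensions (the function name and the screen sizes below are
hypothetical), the fraction of a rectangular picture hidden by a border
of uniform width can be computed as follows.

```python
def blocked_fraction(width_in: float, height_in: float,
                     border_in: float = 1.0) -> float:
    """Fraction of a width x height rectangle covered by a uniform border."""
    inner_w = max(width_in - 2 * border_in, 0.0)
    inner_h = max(height_in - 2 * border_in, 0.0)
    return 1.0 - (inner_w * inner_h) / (width_in * height_in)


# Hypothetical visible picture sizes in inches (width, height).
for w, h in ((8.0, 6.0), (12.0, 9.0), (16.0, 12.0)):
    print(f"{w:.0f} x {h:.0f} in: {blocked_fraction(w, h):.0%} blocked")
```

On the assumed 8 x 6 inch picture, half of the area would be hidden, and
the loss shrinks as the picture grows. This is why the border matters
most on the small monitors used for closeup work.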


B.  Testing  Strategy 

Because  of  the  device's  failure  to  work  properly  and  the  poor  design 
of  the  electronic  tablet,  it  was  not  possible  to  evaluate  the  testing 
strategy  completely.  However,  by  eliminating  the  parallax  (removing 
the  electronic  tablets)  and  using  human  graders  to  record  responses, 
it  was  possible  to  make  a limited  test  of  the  strategy. 

(1) Responding to television. The examinees seemed to have little
trouble understanding the test items, and responded very precisely.

Three types of test items were used: multiple choice, error detection,
and reticle manipulation.

No  training  other  than  instructions  was  required  for  learning  to 
respond  to  the  multiple-choice  and  error-detection  items.  Approximately 
10  minutes  were  required  for  training  on  the  reticle-manipulation  items. 

(2) Time to respond. Ten seconds were allowed for responding to
each item. The time limit was generous and most examinees responded to
most  items  in  less  than  5 seconds. 

(3) Perception of test. The examinees perceived the test as being
"fair" and most actually preferred the television test over the hands-on
test.

(4) Comparison with other tests. The television test was compared
to parallel paper-and-pencil and hands-on tests.

(a)  Paper-and-pencil  test.  Overall  there  was  little  difference 
between  the  mean  scores  on  the  television  and  the  paper-and-pencil  tests. 
However, on an item-by-item basis there was considerable difference for
some  items.  On  error  detection,  the  television  scores  were  much  better; 
on  reticle  manipulation,  the  television  scores  were  worse. 

(b)  Hands-on  test.  There  was  a low  positive  correlation  between 
the television test and the hands-on test. This correlation was less than
desirable. 

The  experiment  was  too  limited  to  permit  any  conclusions  at  this  time 
with  reference  to  the  reliability  and  validity  of  the  above  results. 

(5)  Feedback.  Because  the  examinees  were  tested  four  at  a time  and 
because the Telestrator equipment was not working, it was not possible
to  provide  immediate  feedback  after  each  item. 

(6)  Eye  fatigue.  The  television  test  and  the  responding  mode 
required  the  examinees  to  stare  continually  at  the  television  monitor. 
There  were  many  complaints  of  eyestrain  and  there  is  some  evidence  that 
the  afternoon  television  examinees  performed  more  poorly  than  the  morning 
television  examinees. 


Conclusions  and  Recommendations 

(1) The Telestrator equipment as presently designed should be rejected
because  of  the  parallax  problem. 

(2)  The  television  method  appears  to  offer  enough  promise  to  warrant 
the testing of other response devices which do not have the parallax
problem. 

(3) There are many unknowns in television testing and the overall
testing strategy, and the research effort needs to be greatly expanded
in areas such as:

(a)  A more  definitive  comparison  with  hands-on  tests. 

(b)  Research  on  the  "immediate  feedback"  idea. 

(c)  Using  alternative  response  devices. 

(d)  Comparison  with  slide-tape  devices. 

(e) Further research on eye fatigue.